PGCon2017 - 20180510
PGCon 2017
The PostgreSQL Conference
Speakers | |
---|---|
Rafia Sabih | |
Robert Haas | |
Schedule | |
---|---|
Day | Talks - Day 1 - 2017-05-25 |
Room | DMS 1110 |
Start time | 13:00 |
Duration | 00:45 |
Info | |
---|---|
ID | 1059 |
Event type | Lecture |
Track | New Features |
Language used for presentation | English |
Parallel Query v2
The herd of elephants we unleashed claims new territory
PostgreSQL 9.6 has parallel query, but the only two operators available are Parallel Sequential Scan and Gather. For PostgreSQL 10, many people have done a great deal of work to add more parallel operators and remove existing limitations, expanding the range of queries that can benefit from parallelism.
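To make the division of labor between these two operators concrete, here is a minimal Python sketch of a simplified model. It is not PostgreSQL's C implementation. Threads stand in for background worker processes, and each one claims pages from a shared counter (the way a Parallel Sequential Scan hands out blocks so that each block is scanned exactly once). Workers filter rows and push matches into a queue that a Gather step drains. The function name `parallel_seq_scan_gather`, the list-of-lists page layout, and the single shared queue are illustrative assumptions.

```python
import queue
import threading
from typing import Callable, List


def parallel_seq_scan_gather(pages: List[List[int]],
                             predicate: Callable[[int], bool],
                             nworkers: int = 2) -> List[int]:
    """Toy model of Parallel Seq Scan under a Gather node (hypothetical helper)."""
    next_page = 0                # shared scan position
    lock = threading.Lock()      # protects next_page
    tuple_queue: "queue.Queue[int]" = queue.Queue()

    def worker() -> None:
        nonlocal next_page
        while True:
            # Claim one page at a time, so no page is scanned twice.
            with lock:
                if next_page >= len(pages):
                    return
                page_no = next_page
                next_page += 1
            for row in pages[page_no]:
                if predicate(row):
                    tuple_queue.put(row)

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Gather: collect whatever the workers produced; order is not guaranteed.
    results: List[int] = []
    while not tuple_queue.empty():
        results.append(tuple_queue.get())
    return results


if __name__ == "__main__":
    pages = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    rows = parallel_seq_scan_gather(pages, lambda r: r % 2 == 0, nworkers=3)
    print(sorted(rows))  # prints [2, 4, 6, 8]
```

The sketch simplifies in two ways: the real Gather node reads from one tuple queue per worker while the workers are still running, and the leader process can also take part in the scan. Here, collection waits until every worker has finished.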
First, parallelism has now been extended to other scan methods: index scan, index-only scan, and bitmap heap scan. With these new operators, users can benefit from parallelism even when the conditions on a base relation match only a small fraction of its rows.
Second, several patches have been proposed to improve the parallelization of joins. Parallel merge join makes parallel index scans and parallel index-only scans much more appealing, since those scans can supply input already sorted on the join key. Parallel hash join with a shared hash table allows both the build and probe relations to be scanned in parallel, and also reduces resource usage because the workers build one shared hash table instead of one private copy each.
In addition, the new Gather Merge operator preserves the sort order of the results produced by the workers. This makes aggregates higher in the plan tree more efficient and, in some cases, allows more of a query's work to be done in parallel. The new Parallel Append operator allows multiple branches of an inheritance hierarchy or UNION ALL query to be executed in parallel. Finally, considerable work has been done to relax various restrictions related to subqueries, including allowing parallel query to work even for relations with attached InitPlan and SubPlan nodes.
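Gather Merge can be pictured as a k-way merge. The Python sketch below is a simplified model with stated assumptions: each worker's output is a list already sorted on the same key, and plain integers stand in for tuples. `gather_merge` and `group_counts` are hypothetical names, not PostgreSQL functions. A heap holds only the current head of each worker's stream, so the merged output stays sorted without re-sorting everything the workers produced. A sort-dependent aggregate, here a one-pass group-wise count, can then consume that output directly.

```python
import heapq
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

_DONE = object()  # sentinel marking an exhausted worker stream


def gather_merge(worker_streams: List[Iterable[int]]) -> Iterator[int]:
    """Merge per-worker sorted streams into one sorted stream (k-way merge)."""
    iters = [iter(s) for s in worker_streams]
    # Heap entries are (value, worker index); the index breaks ties.
    heap: List[Tuple[int, int]] = []
    for i, it in enumerate(iters):
        first = next(it, _DONE)
        if first is not _DONE:
            heap.append((first, i))
    heapq.heapify(heap)
    while heap:
        value, i = heapq.heappop(heap)
        yield value
        # Refill from the worker whose tuple was just emitted.
        nxt = next(iters[i], _DONE)
        if nxt is not _DONE:
            heapq.heappush(heap, (nxt, i))


def group_counts(sorted_rows: Iterable[int]) -> List[Tuple[int, int]]:
    """Group-wise count in a single pass; correct only because input is sorted."""
    return [(key, sum(1 for _ in grp)) for key, grp in groupby(sorted_rows)]


if __name__ == "__main__":
    workers = [[1, 3, 5, 7], [2, 3, 6], [0, 8]]
    print(list(gather_merge(workers)))  # [0, 1, 2, 3, 3, 5, 6, 7, 8]
    print(group_counts(gather_merge(workers)))
    # [(0, 1), (1, 1), (2, 1), (3, 2), (5, 1), (6, 1), (7, 1), (8, 1)]
```

Python's standard library provides the same technique as `heapq.merge`. The explicit loop is shown only to make the heap-of-stream-heads idea visible.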
We analyzed the cumulative effect of all these parallel operators using the queries of TPC-H, a standard decision-support benchmark. We experimented primarily with database sizes of 20 GB and 300 GB under various parameter settings, and will present the benchmarking results and possible next steps in this talk.